Random Effects Modeling for Design Variances in Multi-Fidelity Models

wdai

Motivation: Ordinal rankings of designs[^1]

Ideas

  • Clustering of design performances;

  • Highly variable simulation variances;

Motivation: Simulation variance[^2]

[^2] Taken from class slides Introduction-S24.pdf.

Current multi-fidelity model formulation[^1]

  • We assume that the performance of high-fidelity simulations \(Y_{xi}\) for design \(x\) follows a normal distribution with unknown mean \(\theta_x\) and known variance \(\sigma_x^2\):

    \[\begin{equation} Y_{xi} \mid \theta_x \sim \mathcal{N}(\theta_x, \sigma_x^2), \quad i = 1, \dots, l_x, \end{equation}\]

    where \(l_x\) is the number of high-fidelity simulations/repetitions for design \(x\).

  • This assumption is very strong and may not always hold in practice.

Motivation: Inadequate Assumption of Known Variances

  1. Unrealistic in real-world scenarios:

    • In many practical applications, the true variances of the designs are unknown and must be estimated from the observed simulation outputs (the known-variance assumption is often made to avoid adding complexity to an already sophisticated model).

    • Assuming known variances overlooks the uncertainty associated with the variance estimation process.

  2. Ignores the heterogeneity of variances across designs:

    • Different designs may exhibit different levels of variability due to factors such as complexity, sensitivity to input parameters, or inherent uncertainties in the simulation process.

    • Assuming a common known variance for all designs fails to capture the heterogeneity in the variance structure across the design space.

  3. Impacts the accuracy of ranking and selection:

    • Ranking and selection (R&S) of the best design is based on comparing the sample means and variances of the designs.

    • Incorrect assumptions about the variances can distort the ranking and lead to the selection of inferior designs, compromising the overall optimization process.

We could try to estimate the sample variances from the data, which would add another level of complexity to the model.

Proposed Solution: Treating Variances as Unknown and Modeling Them as Random Effects

What if we treat the variances \(\sigma_x^2\) as unknown and model them as random effects?

Overview: Random Effects Modeling

  • Mixed effects models, also known as hierarchical or multilevel models, are a powerful statistical framework for analyzing data with complex structures, such as repeated measures, nested designs, or clustered observations.

  • The general formulation of a linear mixed effects model is given by: \[ Y = \underbrace{X\beta}_{\text{fixed}} + \overbrace{Zb}^{\text{random}} + \epsilon \] where:

    • \(Y\) is the vector of observed responses
    • \(X\) is the design matrix for fixed effects
    • \(\beta\) is the vector of fixed effects parameters
    • \(Z\) is the design matrix for random effects
    • \(b\) is the vector of random effects
    • \(\epsilon\) is the vector of residual errors
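As a toy illustration of this decomposition, the sketch below simulates data from \(Y = X\beta + Zb + \epsilon\) with one fixed intercept and slope and a random intercept per group (all parameter values are illustrative, not from the source):

```python
import random

random.seed(1)

k, n = 10, 50              # groups, observations per group
beta0, beta1 = 2.0, 0.5    # fixed effects (the X*beta part)
tau, sigma = 1.0, 0.3      # sd of random intercepts b_g and residuals eps

# random intercept b_g for each group: the "Zb" term
b = [random.gauss(0.0, tau) for _ in range(k)]

data = []
for g in range(k):
    for _ in range(n):
        x = random.uniform(0.0, 1.0)
        y = beta0 + beta1 * x + b[g] + random.gauss(0.0, sigma)
        data.append((g, x, y))

# each group's mean deviation from the population line roughly recovers b_g
group_dev = []
for g in range(k):
    devs = [y - (beta0 + beta1 * x) for gg, x, y in data if gg == g]
    group_dev.append(sum(devs) / len(devs))

print(group_dev[:3])
```

Because the residual sd (0.3) is small relative to the group size, each `group_dev[g]` sits close to the simulated `b[g]`, which is exactly the "unit-specific deviation from the population average" interpretation discussed next.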

Interpreting Random Effects

  • Random effects in mixed effects models capture the unobserved heterogeneity among individual units or groups within the population.

  • They allow for the estimation of unit-specific deviations from the population average, providing a more realistic representation of the variability in the data.

  • The estimation of random effects does introduce additional complexity in the model, requiring specialized estimation techniques such as restricted maximum likelihood (REML) or Bayesian methods.

  • Random effects modeling is popular in biomedical research because of its ability to account for inter-subject variability and correlation among repeated measurements.

Random effects formulation in multi-fidelity models

Instead of assuming known variances, we introduce random effects to capture the heterogeneity in the variances across designs.

To introduce random effects for the variances, we assume that the variances \(\sigma_x^2\) come from a common distribution up to unknown parameters. Let \(\sigma_x^2\) follow a distribution \(D\) with parameters \(\phi\):

\[ \sigma_x^2 \sim D(\phi) \]

  • Some common choices for \(D\) include the inverse-gamma, log-normal, and half-Cauchy distributions, depending on prior knowledge and the characteristics of the data.
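For intuition, each of these candidate distributions can be sampled with the Python standard library alone (a sketch; the parameter values are illustrative):

```python
import math
import random

random.seed(0)

def sample_inv_gamma(alpha, beta):
    # if X ~ Gamma(alpha, rate=beta), then 1/X ~ InvGamma(alpha, beta);
    # random.gammavariate takes a *scale*, so rate beta means scale 1/beta
    return 1.0 / random.gammavariate(alpha, 1.0 / beta)

def sample_log_normal(mu, sigma):
    return random.lognormvariate(mu, sigma)

def sample_half_cauchy(scale):
    # |C| for C ~ Cauchy(0, scale), via the inverse CDF of the Cauchy
    u = random.random()
    return abs(scale * math.tan(math.pi * (u - 0.5)))

draws = {
    "inv_gamma": [sample_inv_gamma(3.0, 2.0) for _ in range(1000)],
    "log_normal": [sample_log_normal(0.0, 1.0) for _ in range(1000)],
    "half_cauchy": [sample_half_cauchy(1.0) for _ in range(1000)],
}
```

All three distributions put their mass on the positive half-line, which is the basic requirement for a variance model; they differ mainly in tail heaviness (half-Cauchy heaviest).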

Why Inverse Gamma Distribution?

  • If \(X \sim \text{Gamma}(\alpha, \beta)\), then \(1/X \sim \text{InvGamma}(\alpha, \beta)\).

  • The inverse-gamma distribution is widely used in Bayesian statistics as a prior distribution for variance parameters, particularly when working with models that assume normally distributed data (closed-form posterior).

  • The parameters \(\alpha\) and \(\beta\) are shape and scale parameters of the distribution. They can be interpreted as strength and scale of the prior belief about the variances.
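The reciprocal relationship in the first bullet is easy to check by simulation (a sketch; note that `random.gammavariate` is parameterized by shape and *scale*, so a rate of \(\beta\) corresponds to scale \(1/\beta\)):

```python
import random

random.seed(42)

alpha, beta = 5.0, 4.0   # InvGamma shape and scale
n = 200_000

# X ~ Gamma(alpha, rate=beta)  =>  1/X ~ InvGamma(alpha, beta)
draws = [1.0 / random.gammavariate(alpha, 1.0 / beta) for _ in range(n)]

mc_mean = sum(draws) / n
theory_mean = beta / (alpha - 1)   # mean of InvGamma(alpha, beta), here 1.0
print(mc_mean, theory_mean)
```

The Monte Carlo mean lands within a few thousandths of the theoretical mean \(\beta/(\alpha-1)\).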

Inverse Gamma Distribution

(Figure: plot of the inverse-gamma density.)

Random effects formulation

We can extend the formulation of the high-fidelity simulation outputs as follows:

\[\begin{equation} Y_{xi} = \underbrace{\theta_x}_{\text{fixed}} + \overbrace{\varepsilon_{xi}}^{\text{random}}, \quad \varepsilon_{xi} \sim \mathcal{N}(0, \sigma_x^2), \end{equation}\]

where \(Y_{xi}\) is the \(i\)-th simulation output for design \(x\), \(\theta_x\) is the true performance of design \(x\), and \(\varepsilon_{xi}\) is the random error term following a normal distribution with mean 0 and design-specific variance \(\sigma_x^2\).

We assume that the variances of the high-fidelity simulations, \(\sigma_x^2\), follow an inverse-gamma distribution with unknown shape parameter \(\alpha\) and scale parameter \(\beta\):

\[\begin{equation} \sigma_x^2 \sim \text{InvGamma}(\alpha, \beta), \quad \text{for } x = 1, \ldots, k, \end{equation}\]

where \(\alpha > 0\) and \(\beta > 0\) are the unknown parameters to be estimated from the data.

The pdf of the inverse-gamma distribution is given by:

\[\begin{equation} f(\sigma_x^2 \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right), \end{equation}\]

where \(\Gamma(\cdot)\) is the gamma function.
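As a sanity check, the density above can be coded directly and numerically integrated over a grid (a sketch using a simple trapezoidal rule; the grid bounds are ad hoc but capture essentially all of the mass for these parameters):

```python
import math

def inv_gamma_pdf(v, alpha, beta):
    # f(v | alpha, beta) = beta^alpha / Gamma(alpha) * v^{-(alpha+1)} * exp(-beta/v)
    return (beta ** alpha / math.gamma(alpha)) * v ** (-(alpha + 1)) * math.exp(-beta / v)

alpha, beta = 3.0, 2.0
step = 1e-3
grid = [step * (i + 1) for i in range(100_000)]     # covers (0, 100]
vals = [inv_gamma_pdf(v, alpha, beta) for v in grid]

# trapezoidal rule: the density should integrate to ~1
integral = sum((a + b) / 2 * step for a, b in zip(vals, vals[1:]))
print(integral)
```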

Interpretation of the variabilities

Random effects modeling provides us with the flexibility to let the data help determine the variability in the simulation outputs.

  • Case 1: If the data suggests that the variances are similar across all \(k\) designs, then the pdf of the inverse-gamma distribution will be relatively concentrated around a single value, indicating low variability among the designs. The estimated parameters \(\alpha\) and \(\beta\) will be such that the distribution has a small spread and a well-defined peak.

  • Case 2: If the data suggests that the variances are highly different across designs, then the pdf of the inverse-gamma distribution will be more spread out and may have a heavy tail (allowing for high variances for some design \(x\)). The estimated parameters \(\alpha\) and \(\beta\) will be such that the distribution has a larger spread and accommodates a wider range of variances.
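The two cases can be made concrete. Holding the prior mean at 1 (so \(\beta = \alpha - 1\)), the spread is governed by \(\alpha\), since \(\text{Var} = \beta^2 / ((\alpha-1)^2(\alpha-2)) = 1/(\alpha-2)\) under this constraint (a small numeric sketch; the parameter values are illustrative):

```python
def inv_gamma_var(alpha, beta):
    # variance of InvGamma(alpha, beta), defined for alpha > 2
    return beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))

# both priors have mean beta / (alpha - 1) = 1
concentrated = inv_gamma_var(20.0, 19.0)  # Case 1: similar variances across designs
diffuse = inv_gamma_var(3.0, 2.0)         # Case 2: heterogeneous variances
print(concentrated, diffuse)              # ~0.056 vs 1.0
```

A large \(\alpha\) pins the design variances near a common value; a small \(\alpha\) lets them vary widely, matching the two cases above.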

Posterior distribution of \(\sigma_x^2\)

We start by considering the likelihood of the observed data for a single design \(x\). Suppose we have \(l_x\) simulation outputs \(Y_x = (Y_{x1}, \ldots, Y_{xl_x})\) for design \(x\). Assuming that the simulation outputs are normally distributed with mean \(\theta_x\) and variance \(\sigma_x^2\), the likelihood function is given by:

\[\begin{equation} f(Y_x \mid \theta_x, \sigma_x^2) = \prod_{i=1}^{l_x} \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{(Y_{xi} - \theta_x)^2}{2\sigma_x^2}\right). \end{equation}\]

We assume that the prior distribution of \(\sigma_x^2\) follows an inverse-gamma distribution with parameters \(\alpha\) and \(\beta\):

\[\begin{equation} f(\sigma_x^2 \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right). \end{equation}\]

To obtain the posterior distribution of \(\sigma_x^2\), we apply Bayes’ theorem, i.e., posterior \(\propto\) likelihood \(\times\) prior:

\[\begin{equation} f(\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta) \propto f(Y_x \mid \theta_x, \sigma_x^2) f(\sigma_x^2 \mid \alpha, \beta). \end{equation}\]

Individual likelihood

Substituting the likelihood and prior distributions, we have:

\[\begin{align} f(\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta) &\propto \left(\prod_{i=1}^{l_x} \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{(Y_{xi} - \theta_x)^2}{2\sigma_x^2}\right)\right) \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right) \\ &\propto (\sigma_x^2)^{-\frac{l_x}{2}} \exp\left(-\frac{\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{2\sigma_x^2}\right) (\sigma_x^2)^{-(\alpha + 1)} \exp \left( -\frac{\beta}{\sigma_x^2} \right) \\ &\propto (\sigma_x^2)^{-(\alpha + \frac{l_x}{2} + 1)} \exp\left(-\frac{\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{\sigma_x^2}\right). \end{align}\]

Posterior distribution of \(\sigma_x^2\)

The posterior distribution of \(\sigma_x^2\) is proportional to the product of the likelihood and the prior, and we can recognize it as an inverse-gamma distribution with updated parameters:

\[\begin{equation} \sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta \sim \text{InvGamma}\left(\alpha + \frac{l_x}{2}, \beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2\right). \end{equation}\]

The posterior parameters have intuitive interpretations:

  • The shape parameter is updated to \(\alpha + \frac{l_x}{2}\), which incorporates the number of observed simulations \(l_x\) for design \(x\).

  • The scale parameter is updated to \(\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2\), which includes the sum of squared deviations of the observed simulations from the mean \(\theta_x\).

Closed-form posterior mean

The posterior mean of \(\sigma_x^2\) can be computed as:

\[\begin{equation} \mathbb{E}[\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta] = \frac{\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{\alpha + \frac{l_x}{2} - 1}, \end{equation}\]

which is a weighted average of the population-level (design landscape) information (captured by \(\alpha\) and \(\beta\)) and the observed data (represented by the sum of squared deviations \(\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2\)).
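The conjugate update and its posterior mean are one line each in code (a sketch; the function names are illustrative, and \(\theta_x\) is taken as known, as in the derivation):

```python
def posterior_params(alpha, beta, y, theta):
    """InvGamma(alpha, beta) prior -> InvGamma posterior after observing y."""
    ss = sum((yi - theta) ** 2 for yi in y)   # sum of squared deviations
    return alpha + len(y) / 2, beta + 0.5 * ss

def posterior_mean(alpha, beta, y, theta):
    a_post, b_post = posterior_params(alpha, beta, y, theta)
    return b_post / (a_post - 1)              # mean of InvGamma(a_post, b_post)

# example: prior InvGamma(2, 3), two simulations around theta = 2
a_post, b_post = posterior_params(2.0, 3.0, [1.0, 3.0], 2.0)
print(a_post, b_post, posterior_mean(2.0, 3.0, [1.0, 3.0], 2.0))
```

With many replications the data term dominates; with few, the estimate shrinks toward the prior mean \(\beta/(\alpha-1)\).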

  • Frequentist vs. Bayesian:

    Choices of \(\alpha\) and \(\beta\);

    Starting values and prior distributions;

    Treating \(\alpha\) and \(\beta\) as fixed but unknown quantities (frequentist) vs. placing hyperpriors on them (Bayesian estimation).

Multi-fidelity models

We have the complete data log-likelihood (known variances):

\[\begin{equation} \begin{split} \log L(Y, \theta, g \mid Z, \tau, \mu, \Sigma) = & \sum_{x = 1}^{k} \sum_{m = 1}^{M} Z_{xm} \left( \log \tau_m + \log \phi( (\theta_x, g_x) \mid \mu_m, \Sigma_m ) \right) \\ & + \sum_{x = 1}^{k} \sum_{i = 1}^{l_x} \log \phi(Y_{xi} \mid \theta_x, \sigma_x^2). \end{split} \end{equation}\]

Multi-fidelity models with random effects

The updated complete data log-likelihood, incorporating the random effects variances, becomes:

\[\begin{equation} \begin{split} \log L(Y, \theta, g, \sigma^2 \mid Z, \tau, \mu, \Sigma, \alpha, \beta) = & \sum_{x = 1}^{k} \sum_{m = 1}^{M} Z_{xm} \left( \log \tau_m + \log \phi( (\theta_x, g_x) \mid \mu_m, \Sigma_m ) \right) \\ & + \sum_{x = 1}^{k} \left[ \sum_{i = 1}^{l_x} \log \phi(Y_{xi} \mid \theta_x, \sigma_x^2) + \log f(\sigma_x^2 \mid \alpha, \beta) \right], \end{split} \end{equation}\]

  • Since the designs are independent and the design variances are only related to the high-fidelity data \((Y_{xi})\), we can incorporate the inverse-gamma PDF for the design variances into the joint log-likelihood for EM estimation.

Estimation

The EM algorithm can work seamlessly with the random-effects variances. For these variances, the M-step updates for \(\alpha\) and \(\beta\) can be obtained by solving the following equations:

\[\begin{equation} \frac{\partial Q}{\partial \alpha} = k \log \beta - k \psi(\alpha) - \sum_{x = 1}^{k} \log \sigma_x^2 = 0, \end{equation}\]

\[\begin{equation} \frac{\partial Q}{\partial \beta} = \frac{k \alpha}{\beta} - \sum_{x = 1}^{k} \frac{1}{\sigma_x^2} = 0, \end{equation}\]

where \(Q\) is the expected complete data log-likelihood and \(\psi(\cdot)\) is the digamma function:

\[\begin{equation} \psi(x) = \frac{d}{dx} \log \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} \end{equation}\]

The E-step uses the closed-form posterior mean of \(\sigma_x^2\):

\[\begin{equation} \mathbb{E}[\sigma_x^2 \mid Y_x, \theta_x, \alpha, \beta] = \frac{\beta + \frac{1}{2}\sum_{i=1}^{l_x}(Y_{xi} - \theta_x)^2}{\alpha + \frac{l_x}{2} - 1}, \end{equation}\]

  • E-step: Update the latent membership variables \(Z\) and the posterior means and variances of the random effects variances \(\sigma_x^2\) using current parameter estimates and the observed high-fidelity data \(Y_{xi}\).

  • M-step: Update the parameters \(\tau\), \(\mu\), \(\Sigma\), \(\alpha\), and \(\beta\) by maximizing the expected complete data log-likelihood.
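Solving the two score equations reduces to a one-dimensional root find: the \(\beta\)-equation gives \(\beta = k\alpha / \sum_x (1/\sigma_x^2)\), and substituting it into the \(\alpha\)-equation leaves a monotone function of \(\alpha\). A sketch (this plugs in current values of \(\sigma_x^2\); in the full EM these would be posterior expectations of \(\log \sigma_x^2\) and \(1/\sigma_x^2\). The digamma function is approximated by numerically differentiating `math.lgamma`, since the standard library has no digamma):

```python
import math

def digamma(x, h=1e-5):
    # central-difference approximation of d/dx log Gamma(x)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def fit_inv_gamma(sigma2, lo=1e-3, hi=1.0, iters=100):
    """M-step for (alpha, beta) given current variance values sigma2."""
    k = len(sigma2)
    s_inv = sum(1.0 / v for v in sigma2)
    s_log = sum(math.log(v) for v in sigma2)

    def score(alpha):
        beta = k * alpha / s_inv                # from dQ/dbeta = 0
        return k * math.log(beta) - k * digamma(alpha) - s_log

    while score(hi) > 0:                        # expand the bracket upward
        hi *= 2
    for _ in range(iters):                      # bisection on the alpha score
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha, k * alpha / s_inv

alpha_hat, beta_hat = fit_inv_gamma([0.5, 1.2, 0.8, 2.0, 1.5])
print(alpha_hat, beta_hat)
```

The fitted pair satisfies both score equations to numerical precision; the root is unique because \(\log \alpha - \psi(\alpha)\) is strictly decreasing.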

OCBA integration

Recall the OCBA rule[^1]:

\[\begin{equation} \left( \frac{\beta_k}{\sigma_k} \right)^2 = \sum_{x \neq k} \left( \frac{\beta_x}{\sigma_x} \right)^2 \end{equation}\]

\[\begin{equation} \frac{(\mu_x - \mu_k)^2}{\frac{\sigma_x^2}{\beta_x} + \frac{\sigma_k^2}{\beta_k}} = \frac{(\mu_{x'} - \mu_k)^2}{\frac{\sigma_{x'}^2}{\beta_{x'}} + \frac{\sigma_k^2}{\beta_k}}, \quad \forall x, x' \neq k, \end{equation}\]

where \(\beta_1, \dots, \beta_k\) are the proportions of the simulation budget allocated to each design (not to be confused with the inverse-gamma scale parameter \(\beta\)), and design \(k\) is the sample-best design.
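A direct implementation of this rule (a sketch for a maximization problem; designs are indexed from 0 and the sample-best design plays the role of \(k\)):

```python
import math

def ocba_fractions(mu, sigma):
    """Budget fractions beta_x from the OCBA rule (best = largest mean)."""
    b = max(range(len(mu)), key=lambda i: mu[i])       # sample-best design
    w = [0.0] * len(mu)
    for i in range(len(mu)):
        if i != b:
            w[i] = (sigma[i] / (mu[b] - mu[i])) ** 2   # (sigma_x / delta_x)^2
    # best design: (beta_b / sigma_b)^2 = sum_{x != b} (beta_x / sigma_x)^2
    w[b] = sigma[b] * math.sqrt(sum((w[i] / sigma[i]) ** 2
                                    for i in range(len(mu)) if i != b))
    total = sum(w)
    return [wi / total for wi in w]

fracs = ocba_fractions([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(fracs)
```

Normalizing preserves both balance conditions, since each is homogeneous in the \(\beta_x\); note that the design closest in mean to the best receives the largest share among the non-best designs.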

Integrating with OCBA

  1. Generate new simulations for designs:

    In each iteration, we allocate a new batch of simulation budget across the \(k\) designs based on the current estimates.

    The allocation can be determined using the OCBA allocation rule, which would plug in the posterior variances \(\mathbb{E}[\sigma^2_x]\) and the current estimates of the design means.

  2. Streaming data update the estimation parameters, including \(\alpha\) and \(\beta\):

    Suppose we have a workable streaming-data update scheme (e.g., SAEM) for estimating all parameters; we could then incrementally update the parameters as new simulations arrive.

  3. We use the posterior variance \(\mathbb{E}[\sigma_x^2]\) and OCBA allocation rule to generate the next round of simulation budget and return to step 1.

More on convergence of \(\sigma_x^2\) estimates

  1. Starting from an initial guess or estimate of the sampling variances, does the proposed scheme have the potential to converge to the true design variances over iterations?
  2. In the multi-fidelity models paper[^1], the authors proposed a budget allocation policy known as MFBA and established its probability of correct selection (PCS) guarantees.